Abstract: Ultra-large-scale (ULS) systems are becoming pervasive.
They are inherently complex, which makes their design and
control a challenge for traditional methods. Here we propose 
the design and analysis of ULS systems using measures of 
complexity, emergence, self-organization, and homeostasis based 
on information theory. We evaluate the proposal with a ULS 
computing system provided with genetic adaptation mechanisms. 
We show the evolution of the system under both stable and changing
workloads, using different fitness functions. When the adaptive
plan forces the system to converge to a predefined performance
level, the nodes may end up in highly unstable configurations, which
correspond to a high variance over time of the measured complexity.
Conversely, if the adaptive plan is less "aggressive", the system
may be more stable, but the optimal performance may not be
achieved.

Index Terms — ultra-large-scale system; peer-to-peer; evolution; 
complexity; information theory 

I. Introduction 

Ultra-large-scale (ULS) systems are the result of the interconnection of heterogeneous systems, characterized by decentralized goals and control, that as a whole exhibit one or more properties (i.e. behavior) which are not easily inferred from the properties of the individual parts. ULS systems are complex systems, since the interactions of their components determine their future state and that of the system [11]. This interconnectedness limits the predictability of ULS systems, making traditional methods that rely on prediction inadequate [10].

Examples of ULS systems are the Internet, healthcare infrastructures, e-markets, global ambient intelligence systems, and distributed high-performance computing facilities. To overcome the rapidly growing complexity of their management, and to reduce the barrier that complexity poses to further growth, a variety of architectural frameworks based on "self-regulating" components has been proposed [19]. Adaptive plans may turn ULS systems into more efficient, environment-driven systems, provided that the adaptive plan itself does not introduce further complexity and instabilities [?].

In this paper we propose the evaluation of a distributed evolutionary strategy, by means of the complexity measurement principles introduced by Gershenson and Fernandez in [9].

There are dozens of measures of complexity. Several of them are based on information theory [18]. This is convenient because anything can be measured in terms of information, so these measures can be applied to any studied phenomenon. Some measures of complexity correlate with "disorder" or "chaos", so that random phenomena have the highest complexity. However, other approaches consider complexity as a balance between chaos and order. This balance is desirable for computing and living systems, since they require a certain stability but also a certain variability. The ordered extreme is robust, but does not enable adaptation. The chaotic extreme allows for adaptation and exploration, but information cannot be lost. In this approach, complexity is seen as a balance between propagating and transforming information [?].

The paper is organized as follows. Section II reviews related work on ULS evolutionary systems. Section III presents the proposed methodology. Section IV illustrates it with the case study of a ULS peer-to-peer computing system with distributed genetic adaptation. Section V shows the results of the case study evaluation, carried out by means of discrete event simulation. Section VI concludes the paper with a summary of the achieved results and an outline of future work.

II. Related Work 

The SEI study [17] on ULS systems brings together experts in software and other fields to examine the consequences of rapidly increasing scale in software-reliant systems. The report details a broad, multi-disciplinary research agenda for developing the ultra-large-scale systems of the future, which also includes computer-supported evolution, adaptable structure and emergent qualities.

All these aspects can be placed under the umbrella of autonomic computing, which proposes to provide distributed systems with four key properties: self-configuration, self-healing, self-optimization, and self-protection [15]. IBM has suggested a reference model for autonomic control loops, which is sometimes called the MAPE-K (Monitor, Analyze, Plan, Execute, Knowledge) loop [14]. This model is widely used to communicate the architectural aspects of autonomic systems. The MAPE-K construction can be iterated, as the control loop itself could be adaptive (see for example [16] and [5]).

With respect to evolutionary adaptation, Hales introduced an algorithm called SLAC (selfish link and behavior adaptation to produce cooperation) [13]. SLAC is based on the copy and rewire approach, whose basic algorithm assumes that peer nodes have the freedom to change the way they handle and dispatch requests to and from other nodes, and to drop and make links to nodes they know about. Another interesting approach for node restructuring has been proposed by Tyson et al. [20], who show how survival of the fittest has been implemented in the Juno middleware. On receipt of a superior component, Juno dynamically reconfigures the internal architecture of the node, by replacing the existing component with the new one. The Distributed Remodeling Framework (DRF) [?] is a general approach for the design of efficient environment-driven peer-to-peer networks. Thanks to the DRF, modifications of the load on the whole system trigger reconfigurations at the level of single peers, from which global system reconfiguration quickly emerges without centralized control.

III. Methodology 

The ULS systems we want to characterize are made of several thousand nodes that interact in a peer-to-peer fashion, i.e. without centralized control. Each peer has a modular structure that can change dynamically, by adding or removing components. Let $p$ be the total number of component types in the system; we define a vector $M$ of $q < p$ components, called the model of the peer. An adaptive plan $r$ produces a sequence of configurations, i.e. a trajectory through the search space $\mathcal{M}$ of models, following an evolutionary process. Further details, which are not necessary for the purposes of this paper, are given in [?].

By means of the framework introduced by Gershenson and Fernandez in [9], we measure the evolution over time of the information $I$ associated to $M$, when the components of the latter are integer values $x \in [1..n]$ used as parameters for the node's functional processes. Taking into account the latest $W$ configurations of $M$, we measure the frequency of each value $x \in [1..n]$. Then we use the frequency as $P(x)$ and compute the normalized information as



$$I = \frac{-\sum_{x=1}^{n} P(x) \log_2 P(x)}{I_{max}} \tag{1}$$



where $I \in [0, 1]$ and $I_{max} = -\log_2 (1/n)$, since the maximum information value is achieved when all values $1, .., n$ have the same probability [9]. Minimum information ($I = 0$) occurs when only one value is repeated in time. This implies that the node is static, i.e. there is no change.

For example, if the latest $W = 3$ configurations of $M$ have been $(1, 3, 5)$, $(1, 3, 6)$, $(1, 4, 6)$, then the frequencies are $P(1) = 1/3$, $P(2) = 0$, $P(3) = P(6) = 2/9$, $P(4) = P(5) = 1/9$. The associated information is

$$I = \frac{-\frac{1}{3}\log_2\frac{1}{3} - \frac{4}{9}\log_2\frac{2}{9} - \frac{2}{9}\log_2\frac{1}{9}}{-\log_2\frac{1}{6}} \approx 0.85$$
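The computation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `normalized_information` and the representation of the window as a list of tuples are assumptions made here for the example.

```python
import math
from collections import Counter


def normalized_information(window, n):
    """Normalized Shannon information of the gene values in `window`.

    `window` holds the latest W configurations of M (assumed: tuples of
    integers in 1..n); the result lies in [0, 1].
    """
    values = [x for config in window for x in config]
    total = len(values)
    counts = Counter(values)
    # Values that never occur have P(x) = 0 and contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n)  # I_max = -log2(1/n) = log2(n)


# The worked example from the text: W = 3 configurations, n = 6.
window = [(1, 3, 5), (1, 3, 6), (1, 4, 6)]
info = normalized_information(window, n=6)
print(round(info, 2))  # 0.85
```

A window in which a single value is repeated yields an information of zero, which corresponds to the static node described above.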

Since peers are randomly initialized, we assume a random "input" and use the following simplified formulas [9]:

$$\text{emergence} \quad E = I \tag{2}$$

Higher emergence implies that the process produces more information. Low emergence implies that the process does not produce information, i.e. $I = 0$.



$$\text{self-organization} \quad S = 1 - I \tag{3}$$

In this simplified equation, self-organization can be seen as the opposite of emergence. Low emergence implies high self-organization and vice versa. The most self-organized process is that which is static, while the least self-organized process is that which changes the most, i.e. it is the most emergent ($I = 1$).



$$\text{complexity} \quad C = 4 \cdot E \cdot S \tag{4}$$

where the factor 4 is for normalization, since $I \in [0, 1]$ and $C$ is maximized when $I = E = S = 0.5$. Complexity here is seen as a balance between emergence (change) and self-organization (order). Complexity is low when either $E$ or $S$ is high, while it is maximal when they are equal.



$$\text{homeostasis} \quad H = 1 - d(I, I_{init}) \tag{5}$$

where $d$ is the normalized Hamming distance between the $M$ associated to $I$ and $I_{init}$, which is the information associated to the initial configuration of the peer. $H$ is a complementary measure of change in the system. The highest value, $H = 1$, is obtained when there is no change, while $H \approx 1/n$ indicates lack of correlation between the compared states.
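The four measures can be computed directly from the normalized information, as in the following Python sketch. The function names are chosen here for illustration, and the homeostasis helper assumes that the Hamming distance is taken gene by gene between the current and the initial configuration of $M$.

```python
def emergence(info):
    """E = I, eq. (2)."""
    return info


def self_organization(info):
    """S = 1 - I, eq. (3)."""
    return 1.0 - info


def complexity(info):
    """C = 4 * E * S, eq. (4); maximal (C = 1) when E = S = 0.5."""
    return 4.0 * emergence(info) * self_organization(info)


def homeostasis(current, initial):
    """H = 1 - d, eq. (5), with d the normalized Hamming distance.

    Assumption of this sketch: `current` and `initial` are equally long
    configurations of M, compared position by position.
    """
    differing = sum(a != b for a, b in zip(current, initial))
    return 1.0 - differing / len(initial)


print(complexity(0.5))                       # 1.0
print(homeostasis((1, 3, 5), (1, 3, 5)))     # 1.0
```

Because $C$ is a product of $E$ and $S$, both the static extreme ($I = 0$) and the fully random extreme ($I = 1$) give $C = 0$.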

IV. Case study 

We use the example of an ultra-large-scale computing 
system where nodes form a peer-to-peer overlay network, 
whose global functioning is the result of the interactions 
among nodes, and depends on their local evolution based on 
an adaptive plan implemented as a genetic algorithm (GA). In 
the considered system, shared resources cannot be acquired 
(by replication) once discovered, but may only be directly 
used upon contracting with their hosts. An example of such 
resources is disk space, which can be partitioned and allocated 
to requestors for the duration of a task, or in general for an 
arranged time. 

The topology of the envisioned ultra-large-scale system can 
be represented as an undirected graph whose arcs stand for 
mutual knowledge among peers. The node degree k is the 
number of neighbors of each node. Its statistical distribution 
depends on the history and on the dynamics of the network, 
and may affect the performance of the distributed algorithms 
which are executed. 

Every peer executes a resource lookup algorithm based on epidemic query propagation [?]. In detail, peers generate lookup messages that are matched with local resources and, if necessary, propagated to neighbors. Each peer has a cache that contains the descriptors of remote resources that have been previously discovered. This cache is consulted before sending random (i.e. blind) queries. Every lookup message has a time-to-live ($T$), representing the remaining number of hops before the message itself expires, i.e. it is no longer propagated. Moreover, every peer caches received queries, in order to drop subsequent duplicates.
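The lookup mechanism described above can be illustrated with a simplified Python sketch. It is not the simulated protocol: the `Peer` class, its method names, the depth-first recursion (instead of parallel message passing), the use of the descriptor cache by intermediate peers, and the oldest-first cache eviction are all assumptions made here to keep the example short.

```python
import math
import random
from collections import OrderedDict


class Peer:
    def __init__(self, name, resources=(), d_max=4):
        self.name = name
        self.resources = set(resources)  # locally hosted resources
        self.neighbors = []              # peers this peer knows about
        self.cache = OrderedDict()       # resource -> provider, at most d_max entries
        self.d_max = d_max
        self.seen = set()                # ids of queries already received

    def remember(self, resource, provider):
        self.cache[resource] = provider
        while len(self.cache) > self.d_max:
            self.cache.popitem(last=False)  # evict the oldest descriptor

    def receive(self, query_id, resource, ttl, f_k, rng):
        """Return the name of a provider for `resource`, or None."""
        if query_id in self.seen:
            return None                  # duplicate query: drop it
        self.seen.add(query_id)
        if resource in self.resources:
            return self.name             # match with a local resource
        if resource in self.cache:
            return self.cache[resource]  # cache is used before blind queries
        if ttl <= 0 or not self.neighbors:
            return None                  # message expired
        fanout = max(1, math.ceil(f_k * len(self.neighbors)))
        for neighbor in rng.sample(self.neighbors, fanout):
            provider = neighbor.receive(query_id, resource, ttl - 1, f_k, rng)
            if provider is not None:
                self.remember(resource, provider)
                return provider
        return None


def connect(a, b):
    """Create a bidirectional link between two peers."""
    a.neighbors.append(b)
    b.neighbors.append(a)


# A three-peer chain a - b - c, where only c hosts the requested resource.
a, b, c = Peer("a"), Peer("b"), Peer("c", {"disk"})
connect(a, b)
connect(b, c)
rng = random.Random(1)
print(a.receive(1, "disk", ttl=1, f_k=1.0, rng=rng))  # None: c is two hops away
print(a.receive(2, "disk", ttl=2, f_k=1.0, rng=rng))  # c
```

After the second query, both `a` and `b` hold a descriptor for the resource, so later queries can be answered from the cache without any propagation.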

The resource discovery process is affected by three parameters:

• $f_k$ = fraction of neighbors targeted for query propagation

• $T_{max}$ = propagation depth, i.e. the maximum number of times a query is forwarded by peers before it is removed from the network

• $D_{max}$ = maximum size of the cache

These parameters are combined in the phenotype of each peer, whose simplest form is:

$$\Phi = G(f_k, T_{max}, D_{max}) = \phi_0 f_k + \phi_1 T_{max} + \phi_2 D_{max} \tag{6}$$

where the constant weights $\phi_0, \phi_1, \phi_2 \in \mathbb{R}$ control the influence of each parameter.

In traditional epidemic algorithms, parameters have fixed values, set in advance according to the results of some tuning process which should guarantee good performance with high probability. In fact, if all peers were configured with $f_k = 1$ and the same $T$ value, the resulting scheme would be exactly the Gnutella protocol [12]. In our scheme, parameters are functions of the chromosome, and are thus randomly initialized when a peer is created and joins the network. Moreover, the adaptive tuning of parameters is based on a GA with a fitness function $F$ to be minimized, which depends on two parameters. The first one is the average query hit ratio



$$\langle QHR \rangle = A_Q(QHR_0, .., QHR_k) = \frac{1}{k+1} \sum_{j=0}^{k} QHR_j \tag{7}$$



The query hit ratio is the number of query hits $QH$, i.e. successful lookups, divided by the number of submitted queries $Q$. In the previous equation, we assume that $j = 0$ for the considered peer, and $j = 1, .., k$ for its neighbors.
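As a small illustration, the average query hit ratio of a peer and its $k$ neighbors can be computed as follows. The function names are hypothetical, and the default value returned for a peer that has not yet submitted any query is an assumption of this sketch.

```python
def query_hit_ratio(hits, queries, initial=0.5):
    """QHR = QH / Q; `initial` is returned while no query has been submitted."""
    return hits / queries if queries else initial


def average_qhr(own_qhr, neighbor_qhrs):
    """Mean of the peer's own QHR (j = 0) and those of its k neighbors."""
    values = [own_qhr] + list(neighbor_qhrs)
    return sum(values) / len(values)


own = query_hit_ratio(hits=3, queries=4)   # 0.75
print(average_qhr(own, [0.25, 0.5]))       # 0.5
```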

All peers are characterized by a genotype $M$, defined by three genes $\{M_0, M_1, M_2\}$ with values in a limited subset of $\mathbb{N}$. The mapping between the phenotype and genotype is given by the following equations:

• $f_k = c_0 M_0$

• $T_{max} = c_1 M_1$

• $D_{max} = c_2 M_2$

where $c_i \in \mathbb{R}$ (with $i = 0, 1, 2$) are constants.



A. Fitness functions 

Within the set of fitness functions we defined in [?], we chose the following:



F 2 (<t>,(QHR)) 
F 4 ($,(QHR)) 



(1 



(QHR))~ 



(QHR)® 



(QHR) 
Th 



!)(#-) + (! 



(QHR)) 



where $Th$ is a threshold value and $\Phi_m = \max\{\Phi\}$.

Both fitness functions reward lower phenotypes when the average query hit ratio $\langle QHR \rangle$ is high, and higher ones when $\langle QHR \rangle$ is low. $F_4$ allows a threshold $Th$ to be set for $\langle QHR \rangle$, below which the $M_i$ are forced to grow. Conversely, when $\langle QHR \rangle > Th$ the $M_i$ are forced to decrease. $F_2$ does not allow a threshold to be set for $\langle QHR \rangle$.



B. Adaptive plan 

The block diagram in Fig. 2 illustrates how each peer works. The resource lookup is executed in a separate thread with respect to the adaptation process. However, they influence each other.

Fig. 2. Functional architecture of the peer.



Algorithm 1 Adaptation

let $g = 0$
while not converged do
    evaluate the fitness of own model
    $g = g + 1$
    select neighbor with best fitting model
    perform cross-over with best neighbor to generate offspring $O_g = \{O_{g1}, O_{g2}\}$
    mutate offspring in $O_g$ with probability $1 - \langle QHR \rangle$
    select the new generation $\mathcal{M}_g$ from the previous generation $\mathcal{M}_{g-1}$ and the offspring $O_g$
end while



The pseudocode in Algorithm 1 describes the adaptive plan which is periodically executed by each peer. The best neighbor is chosen using proportional selection, i.e. the chance of a neighbor's model being selected is inversely proportional to its fitness value. Cross-over is always performed, with a randomly generated crosspoint. Mutation depends on $\langle QHR \rangle$, being highly improbable when the average query hit ratio of the peer and its neighbors tends to 1. The final selection between the current peer's chromosome and the mutated offspring is random, with probability inversely proportional to their fitness values.
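One iteration of the adaptive plan could be sketched in Python as follows. This is an illustrative reading of the description above, not the simulated code: the function names, the mutation operator (one gene replaced by a uniform value in $[1..n]$), and the `fitness` callable (any strictly positive function to be minimized, standing in for $F_2$ or $F_4$) are assumptions.

```python
import random


def select_inverse_fitness(candidates, fitnesses, rng):
    """Proportional selection: probability inversely proportional to fitness.

    Assumes strictly positive fitness values (lower is better).
    """
    weights = [1.0 / f for f in fitnesses]
    return rng.choices(candidates, weights=weights, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """Single-point cross-over with a randomly generated crosspoint."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(genes, avg_qhr, n, rng):
    """Mutate with probability 1 - <QHR> (hypothetical operator)."""
    genes = list(genes)
    if rng.random() < 1.0 - avg_qhr:
        genes[rng.randrange(len(genes))] = rng.randint(1, n)
    return tuple(genes)


def adaptation_step(own, neighbor_models, fitness, avg_qhr, n, rng):
    """Return the model of the next generation for one peer."""
    best = select_inverse_fitness(
        neighbor_models, [fitness(m) for m in neighbor_models], rng)
    offspring = [mutate(child, avg_qhr, n, rng)
                 for child in crossover(own, best, rng)]
    pool = [own] + offspring
    return select_inverse_fitness(pool, [fitness(m) for m in pool], rng)


rng = random.Random(0)
new_model = adaptation_step(
    own=(1, 2, 3),
    neighbor_models=[(4, 5, 6), (6, 5, 4)],
    fitness=lambda model: float(sum(model)),  # placeholder fitness
    avg_qhr=0.5, n=6, rng=rng)
print(new_model)
```

With an average query hit ratio of 1 the mutation branch is never taken, so every gene of the new model comes either from the peer itself or from the selected neighbor.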

V. Simulations 

To evaluate the performance of the proposed ultra-large-scale system, with and without GA-based adaptation, we used a general-purpose discrete event simulation environment called DEUS, based on Java and XML and released as open source under the GPL license [1]. This tool was used instead of classical network simulators (e.g., ns-2 or Opnet), since it is optimized to analyze P2P networks at a higher level. In particular, with DEUS it is possible to simulate highly dynamic overlay (i.e. application-level) networks, with several hundred thousand nodes, on a single machine, without the need to also simulate the lower network layers (whose effect can in any case be taken into account when defining the virtual time scheduling of message propagation events).

Fig. 1. Plot of F2 and F4 with the parameters' values of the simulation experiments.

Each simulated peer was characterized by three kinds of consumable resources: CPU, RAM, and disk space. Their values were randomly generated (with uniform distribution) as multiples of, respectively, 512 MHz, 256 MB, and 10 GB. The maximum amount of resources per peer was 2 GHz of CPU, 1 GB of RAM, and 100 GB of disk space.

We assumed $M_i \in [1..6]$ ($n = 6$), and

• $f_k = M_0 / 6$

• $T_{max} = M_1$

• $D_{max} = 2 M_2$

The phenotype $\Phi$ is given by eq. (6) with $\phi_0 = 100$, $\phi_1 = 10$, and $\phi_2 = 5$, for which $f_k$ has more importance than $T_{max}$ and $D_{max}$ in the computation of the fitness value. The average query hit ratio $\langle QHR \rangle$ is given by eq. (7). We assumed that each peer at the beginning has $QHR = 0.5$.
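With these values, the mapping from genotype to phenotype can be written as a short Python function. The function name and the tuple representation of the genotype are assumptions made for this sketch.

```python
def phenotype(genes, phi=(100.0, 10.0, 5.0)):
    """Phenotype of eq. (6) for a genotype (M0, M1, M2) with genes in 1..6."""
    m0, m1, m2 = genes
    f_k = m0 / 6.0    # fraction of neighbors targeted for query propagation
    t_max = m1        # propagation depth
    d_max = 2 * m2    # maximum size of the cache
    return phi[0] * f_k + phi[1] * t_max + phi[2] * d_max


print(phenotype((6, 6, 6)))  # 220.0
print(phenotype((6, 1, 1)))  # 120.0
```

The phenotype therefore ranges from about 36.7, for the genotype $(1, 1, 1)$, to 220, for the genotype $(6, 6, 6)$.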

All the experiments were carried out with a simulated P2P network of $N = 10000$ nodes, with a topology randomly constructed without preferential attachment, starting from $N_0$ completely connected nodes, and each other peer being attached to $m \in [1, N_0]$ existing peers, with even probability. All connections are bidirectional, i.e. if node $n_a$ has node $n_b$ in its peerview, then $n_b$ has $n_a$ in its peerview. The resulting degree distribution is exponential [4]:

$$P(k) = \left(1 - e^{-1/m}\right) e^{(1 - k/m)} \quad \forall k \geq m$$

with expected value

$$E(k) = \sum_{k=m}^{\infty} k P(k) = m + \frac{1}{e^{1/m} - 1}$$
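The construction of such a topology can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DEUS implementation: the function name `build_overlay`, the adjacency-set representation, and the choice of attaching every new peer to exactly $m$ distinct existing peers are made here for the example.

```python
import random


def build_overlay(n_total, n0, m, rng):
    """Grow a random overlay without preferential attachment.

    Starts from n0 completely connected peers; each further peer is linked
    to m distinct existing peers chosen with even probability.
    """
    neighbors = {i: set(range(n0)) - {i} for i in range(n0)}
    for new in range(n0, n_total):
        targets = rng.sample(range(new), m)
        neighbors[new] = set(targets)
        for target in targets:
            neighbors[target].add(new)  # connections are bidirectional
    return neighbors


overlay = build_overlay(n_total=1000, n0=5, m=3, rng=random.Random(42))
degrees = [len(peers) for peers in overlay.values()]
print(min(degrees), sum(degrees) / len(degrees))
```

Since every peer added after the initial clique brings $m$ links with it, no peer ends up with fewer than $m$ neighbors as long as $N_0 - 1 \geq m$.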



In these experiments, we used $m = 3$ and $N_0 = 5$. Every node performs adaptation every 7 seconds. Figures 3 to 6 illustrate the average evolution of $M$ and QHR over time, for each considered fitness function. We considered two scenarios: stable load and variable load.

In the first scenario, we simulated 150 minutes of the life of a network of peers, with resource queries occurring continuously and independently of one another, according to a Poisson process with a mean inter-arrival time of 0.028 s, i.e. about 36 queries every second, each one associated to a randomly chosen node. Each query is a request for a randomly generated amount of resources (no more than 2 GHz of CPU, 1 GB of RAM, and 100 GB of disk space, respectively). Once a resource has been found, it is consumed for a random time interval, drawn from an exponential distribution with mean value 280 s, for which the system's utilization would be 1 if a provider were found for every request.

In the second scenario, we still simulated 150 minutes of the life of a network of peers, but we considered a load on the system that changes over time. In the first 50 minutes, almost all resource queries ask for 2 GHz of CPU, 1 GB of RAM, and 100 GB of disk space (as in the previous experiments). In the following 100 minutes the load is reduced, with almost all resource queries asking for 256 MHz of CPU, 128 MB of RAM, and 1 GB of disk space.

We also measured the evolution over time of the information $I$ associated to the current genotype $M$, according to the procedure illustrated in Section III. Every 10 virtual minutes the simulator executes a log event, where peers that have performed at least one lookup are considered for computing the mean value and standard deviation of $M_i$ $\forall i \in [0, 2]$, QHR, $E$, $S$, $C$ and $H$. Figures 7 to 10 are related to the case of $W = 1$.
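The aggregation performed at each log event can be illustrated with the following Python sketch. The dictionary-based peer records, the field names, and the use of the population standard deviation are assumptions of this example, not details taken from the simulator.

```python
import statistics


def log_event(peers, measures=("E", "S", "C", "H")):
    """Mean and standard deviation of each measure over active peers.

    Only peers that have performed at least one lookup are considered.
    """
    active = [p for p in peers if p["lookups"] > 0]
    if not active:
        return {}
    summary = {}
    for name in measures:
        values = [p[name] for p in active]
        summary[name] = (statistics.mean(values), statistics.pstdev(values))
    return summary


peers = [
    {"lookups": 3, "E": 0.2, "S": 0.8, "C": 0.64, "H": 1.0},
    {"lookups": 0, "E": 0.9, "S": 0.1, "C": 0.36, "H": 0.0},  # ignored
    {"lookups": 1, "E": 0.4, "S": 0.6, "C": 0.96, "H": 0.5},
]
for name, (mean, std) in log_event(peers).items():
    print(name, round(mean, 3), round(std, 3))
```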

For the stable load scenario, $F_2$ produces balanced change over time, reflected in medium $E$ and $S$ and a high $C$. $F_4$ offers less change on average, as seen by a higher $S$ and $H$, lower $E$, and medium $C$. Still, the rate of information change varies considerably over time, leading to a variance in the measures that is especially high for $C$.

For the variable load scenario, $F_2$ performs in a similar fashion: it is not able to adapt to the reduction of the demand. Conversely, $F_4$ does adapt to the demand change at minute 50, stabilizing at minute 80. This is seen in maximal $S$ and $H$ and minimal $E$ and $C$, indicating that there is no change in the nodes. This is because of the threshold used in $F_4$.

Fig. 3. Average $M_0$, $M_1$, $M_2$ and QHR over time, for fitness function F2. The load on the system is stable.

Fig. 4. Average $M_0$, $M_1$, $M_2$ and QHR over time, for fitness function F4. The load on the system is stable.

VI. Conclusion

We presented simulation results of a ULS computing system with two different types of genetic adaptation while measuring their complexity, emergence, self-organization and homeostasis. The main result is that less "aggressive" adaptive plans lead to a more stable system, but the optimal performance may not be achieved.



Future work will follow two main directions. Firstly, with respect to the use case proposed in this paper, we will study the impact of parameter variations and we will also consider other workload profiles. Secondly, we will generalize the analysis to the case of systems with components that dynamically change their structure, by removing or adding components.
Fig. 5. Average $M_0$, $M_1$, $M_2$ and QHR over time, for fitness function F2. The load on the system changes at minute 50.

Fig. 6. Average $M_0$, $M_1$, $M_2$ and QHR over time, for fitness function F4. The load on the system changes at minute 50.
Fig. 10. Average E, S, C and H over time, for fitness function F4. The load on the system changes at minute 50. (Four panels, each titled "Adaptation with F4"; horizontal axis: t [minutes].)